--On Donnerstag, Februar 16, 2006 10:39:45 -0800 Dann Corbit
<DCorbit@connx.com> wrote:
> He refers to counting sort and radix sort (which comes in most
> significant digit and least significant digit format). These are also
> called distribution (as opposed to comparison) sorts.
>
> These sorts are O(n) as a function of the data size, but really they are
> O(M*n) where M is the average key length and n is the data set size.
> (In the case of MSD radix sort M is the average length to completely
> differentiate strings)
>
> So if you have an 80 character key, then 80*log(n) will only be faster
I suppose you meant 80 * n here?
> than n*log(n) when you have 2^80th elements -- in other words -- never.
I think this is wrong. You can easily improve Radix sort by a constant if
you don't take single bytes as the digits but rather k-byte values. At
least 2 byte should be possible without problems. This would give you 40 *
n time, not 80 * n. And you assume that the comparision of an 80-byte wide
value only costs 1, which might (and in many cases will be imho) wrong.
Actually it migh mean to compare 80 bytes as well, but I might be wrong.
What I think as the biggest problem is the digit representation necessary
for Radix-Sort in cases of locales which sort without looking at spaces. I
assume that would be hard to implement. The same goes for the proposed
mapping of string values onto 4/8-byte values.
Mit freundlichem Gruß
Jens Schicke
--
Jens Schicke j.schicke@asco.de
asco GmbH http://www.asco.de
Mittelweg 7 Tel 0531/3906-127
38106 Braunschweig Fax 0531/3906-400